schelling diagram
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- South America > Brazil > São Paulo (0.04)
- (6 more...)
Supplementary Material
We provide additional results for EGTA applied to networked MARL system control for CPR management. Restraint percentages under different regeneration rates: The heatmaps in Figure 7 (A-C) highlight the differences in restraint percentage for different values of α as the regeneration rate is changed from high (0.1) to [...] In the case where agents are completely self-interested (α = 0), shown in (A), the majority of algorithms without communication display very low levels of restraint for all rates of regeneration. The orange ovals in these diagrams indicate which system configurations correspond to the highest expected payoff for all agents. Schelling diagrams using a different parameterisation: An alternative parameterisation for a Schelling diagram is to plot payoffs for a particular agent (cooperating or defecting) with respect to the number of other cooperators on the x-axis, instead of the total number of cooperators.
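The two parameterisations mentioned in this excerpt carry the same information and differ only in how the cooperator curve is indexed. The following is a minimal sketch of the conversion, assuming a symmetric game in which a payoff depends only on the agent's own action and on how many agents cooperate. The function name `to_other_cooperator_axis` and all payoff numbers are hypothetical and are not taken from the paper.

```python
from typing import Dict, Tuple

Payoffs = Dict[int, float]


def to_other_cooperator_axis(
    r_c_total: Payoffs, r_d_total: Payoffs
) -> Tuple[Payoffs, Payoffs]:
    """Re-index Schelling curves from total cooperators to other cooperators.

    r_c_total maps C (1..N) to the payoff of a cooperator when C agents
    cooperate in total. r_d_total maps C (0..N-1) to the payoff of a defector.
    """
    # A cooperator is counted in C, so it sees C - 1 other cooperators.
    r_c_other = {c - 1: payoff for c, payoff in r_c_total.items()}
    # A defector is not counted in C, so it sees all C cooperators as others.
    r_d_other = dict(r_d_total)
    return r_c_other, r_d_other


# Hypothetical payoffs for N = 4 agents (illustrative numbers only).
r_c_total = {1: 0.0, 2: 1.0, 3: 2.0, 4: 3.0}
r_d_total = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}

r_c_other, r_d_other = to_other_cooperator_axis(r_c_total, r_d_total)
```

After the conversion both curves share the same x-axis, 0 to N - 1 other cooperators, so the two payoffs at a given x-value describe the choice faced by a single focal agent.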
- Europe > United Kingdom > England > Greater London > London (0.05)
- South America > Brazil > São Paulo (0.04)
- Oceania > Australia > Victoria > Melbourne (0.04)
- (7 more...)
- Leisure & Entertainment > Games (0.48)
- Food & Agriculture > Fishing (0.47)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- South America > Brazil > São Paulo (0.04)
- (6 more...)
- Social Sector (0.75)
- Leisure & Entertainment > Games (0.68)
Supplementary Material
We provide additional results for EGTA applied to networked MARL system control for CPR management. Specifically, we investigate the consequence of different reward structures. Potential Nash equilibria are shaded in blue. NeurComm (across all values of α), which is likely due to its consensus update mechanism. The orange ovals in these diagrams indicate which system configurations correspond to the highest expected payoff for all agents.
Review for NeurIPS paper: A game-theoretic analysis of networked system control for common-pool resource management using multi-agent reinforcement learning
Weaknesses: - In multi-agent reinforcement learning research, Schelling diagrams are normally plotted as a function of the number of *other cooperators* (besides the focal agent making the decision), i.e. C - 1, rather than the total number of cooperators, C, as was done here. Either way is certainly correct in principle; Schelling said as much in the original 1973 paper. However, there are several reasons why the C - 1 parameterization is convenient. For instance, it lets you read off game-theoretic properties from the diagram more easily. To see whether cooperation or defection is favored for a particular number of other cooperators, you simply compare the point on the R_c curve to the point on the R_d curve at the same x-value, directly above or below it.
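The pointwise comparison described in this review can be sketched in a few lines. The example below assumes a symmetric game where `r_c[k]` and `r_d[k]` are the payoffs for cooperating and defecting when `k` other agents cooperate, with `k` running from 0 to N - 1. The function names and the payoff values are hypothetical, chosen only to illustrate the reading; they are not results from the reviewed paper.

```python
from typing import List, Sequence


def favored_actions(r_c: Sequence[float], r_d: Sequence[float]) -> List[str]:
    """For each number of other cooperators k, report the better action."""
    result = []
    for payoff_c, payoff_d in zip(r_c, r_d):
        if payoff_c > payoff_d:
            result.append("C")
        elif payoff_d > payoff_c:
            result.append("D")
        else:
            result.append("=")  # focal agent is indifferent
    return result


def pure_nash_cooperator_counts(
    r_c: Sequence[float], r_d: Sequence[float]
) -> List[int]:
    """Total cooperator counts c at which no single agent gains by switching."""
    n = len(r_c)  # population size N; curves are indexed by k = 0..N-1
    equilibria = []
    for c in range(n + 1):
        # A cooperator sees c - 1 other cooperators and must not prefer D.
        cooperators_stay = c == 0 or r_c[c - 1] >= r_d[c - 1]
        # A defector sees c other cooperators and must not prefer C.
        defectors_stay = c == n or r_d[c] >= r_c[c]
        if cooperators_stay and defectors_stay:
            equilibria.append(c)
    return equilibria


# Hypothetical curves for N = 4: defection pays more at every k.
dilemma_c = [0.0, 1.0, 2.0, 3.0]
dilemma_d = [1.0, 2.0, 3.0, 4.0]
dilemma_favored = favored_actions(dilemma_c, dilemma_d)
dilemma_nash = pure_nash_cooperator_counts(dilemma_c, dilemma_d)

# Hypothetical curves where cooperation pays once enough others cooperate.
coord_c = [0.0, 1.0, 3.0, 5.0]
coord_d = [1.0, 1.5, 2.0, 2.5]
coord_favored = favored_actions(coord_c, coord_d)
coord_nash = pure_nash_cooperator_counts(coord_c, coord_d)
```

In the first set of curves defection is favored at every point, so the only pure equilibrium has zero cooperators. In the second set the curves cross, which yields equilibria at both all-defect and all-cooperate. Both checks use only same-index comparisons, which is what the C - 1 parameterization makes possible.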
Inequity aversion improves cooperation in intertemporal social dilemmas
Hughes, Edward, Leibo, Joel Z., Phillips, Matthew, Tuyls, Karl, Duéñez-Guzmán, Edgar, Castañeda, Antonio García, Dunning, Iain, Zhu, Tina, McKee, Kevin, Koster, Raphael, Roff, Heather, Graepel, Thore
Groups of humans are often able to find ways to cooperate with one another in complex, temporally extended social dilemmas. Models based on behavioral economics are only able to explain this phenomenon for unrealistic stateless matrix games. Recently, multi-agent reinforcement learning has been applied to generalize social dilemma problems to temporally and spatially extended Markov games. However, this has not yet generated an agent that learns to cooperate in social dilemmas as humans do. A key insight is that many, but not all, human individuals have inequity averse social preferences. This promotes a particular resolution of the matrix game social dilemma wherein inequity-averse individuals are personally pro-social and punish defectors. Here we extend this idea to Markov games and show that it promotes cooperation in several types of sequential social dilemma, via a profitable interaction with policy learnability. In particular, we find that inequity aversion improves temporal credit assignment for the important class of intertemporal social dilemmas. These results help explain how large-scale cooperation may emerge and persist.
- North America > United States > New York > New York County > New York City (0.14)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- (11 more...)
- Social Sector (1.00)
- Leisure & Entertainment > Games (0.68)
Inequity aversion improves cooperation in intertemporal social dilemmas
Hughes, Edward, Leibo, Joel Z., Phillips, Matthew G., Tuyls, Karl, Duéñez-Guzmán, Edgar A., Castañeda, Antonio García, Dunning, Iain, Zhu, Tina, McKee, Kevin R., Koster, Raphael, Roff, Heather, Graepel, Thore
Groups of humans are often able to find ways to cooperate with one another in complex, temporally extended social dilemmas. Models based on behavioral economics are only able to explain this phenomenon for unrealistic stateless matrix games. Recently, multi-agent reinforcement learning has been applied to generalize social dilemma problems to temporally and spatially extended Markov games. However, this has not yet generated an agent that learns to cooperate in social dilemmas as humans do. A key insight is that many, but not all, human individuals have inequity averse social preferences. This promotes a particular resolution of the matrix game social dilemma wherein inequity-averse individuals are personally pro-social and punish defectors. Here we extend this idea to Markov games and show that it promotes cooperation in several types of sequential social dilemma, via a profitable interaction with policy learnability. In particular, we find that inequity aversion improves temporal credit assignment for the important class of intertemporal social dilemmas. These results help explain how large-scale cooperation may emerge and persist.
- North America > United States > New York > New York County > New York City (0.14)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- (13 more...)
- Social Sector (1.00)
- Leisure & Entertainment > Games (0.93)